Abstract

Deep neural networks are a branch of machine learning that has seen a meteoric rise in popularity due to its
powerful ability to represent and model high-level abstractions in highly complex data. One area in deep neural
networks that is ripe for exploration is neural connectivity formation. A pivotal study on the brain tissue of
rats found that synaptic formation for specific functional connectivity in neocortical neural microcircuits can
be surprisingly well modeled and predicted as a random formation. Motivated by this intriguing finding, we
introduce the concept of StochasticNet, where deep neural networks are formed via stochastic connectivity
between neurons. As a result, any type of deep neural network can be formed as a StochasticNet by allowing the
neuron connectivity to be stochastic. Stochastic synaptic formations, in a deep neural network architecture, can
allow for efficient utilization of neurons for performing specific tasks. To evaluate the feasibility of such a deep
neural network architecture, we train a StochasticNet using four different image datasets (CIFAR-10, MNIST,
SVHN, and STL-10). Experimental results show that a StochasticNet, using fewer than half the neural
connections of a conventional deep neural network, achieves comparable accuracy and reduces overfitting on
the CIFAR-10, MNIST, and SVHN datasets. Interestingly, a StochasticNet with fewer than half the neural
connections achieved a higher accuracy (relative improvement in test error rate of ~6% compared to ConvNet)
on the STL-10 dataset than a conventional deep neural network. Finally, StochasticNets have faster operational
speeds while achieving better or similar accuracy.


1 Introduction 

Deep neural networks are a branch of machine learning that has seen a meteoric rise in popularity due to its powerful ability
to represent and model high-level abstractions in highly complex data. Deep neural networks have shown considerable
capabilities for handling specific complex tasks such as speech recognition [1, 2], object recognition [3, 4, 5, 6], and natural
language processing [7, 8]. Recent advances in improving the performance of deep neural networks have focused on areas such as
network regularization [9, 10], activation functions [11, 12, 13], and deeper architectures [6, 14, 15]. However, the neural
connectivity formation of deep neural networks has remained largely the same over the past decade, and thus further exploration
and investigation of alternative approaches to neural connectivity formation can hold considerable promise.

To explore alternative deep neural network connectivity formation, we take inspiration from nature by looking at the way the brain
develops synaptic connectivity between neurons. Recently, in a pivotal paper by Hill et al. [16], data of living brain tissue from
Wistar rats was collected and used to construct a partial map of a rat brain. Based on this map, Hill et al. came to a very surprising
conclusion: the synaptic formation of specific functional connectivity in neocortical neural microcircuits can be modelled and
predicted as a random formation. In comparison, for the construction of deep neural networks, the neural connectivity formation
is largely deterministic and pre-defined.

Motivated by Hill et al.’s finding of random neural connectivity formation, we aim to investigate the feasibility and efficacy
of devising stochastic neural connectivity formation to construct deep neural networks. To achieve this goal, we introduce the
concept of StochasticNet, where the key idea is to leverage random graph theory [17, 18] to form deep neural networks via
stochastic connectivity between neurons. As such, we treat the formed deep neural networks as particular realizations of a random
graph. Such stochastic synaptic formations in a deep neural network architecture can potentially allow for efficient utilization
of neurons for performing specific tasks. Furthermore, since the focus is on neural connectivity, the StochasticNet architecture
can be used directly like a conventional deep neural network and benefit from all of the same approaches used for conventional
networks, such as data augmentation, stochastic pooling, Dropout [19], and DropConnect [20].

Figure 1: An illustrative example of a random graph. All possible edge connectivity between the nodes in the graph may occur
independently with a probability of $p_{ij}$.

While a number of stochastic strategies for improving deep neural network performance have been previously introduced [19-21],
it is very important to note that the proposed StochasticNets are fundamentally different from these existing stochastic strategies.
The main contribution of StochasticNets deals primarily with the formation of the neural connectivity of individual neurons
to construct efficient deep neural networks that are inherently sparse prior to training, while previous stochastic strategies deal
with either the grouping of existing neural connections to explicitly enforce sparsity [21], or the removal/introduction of neural
connectivity for regularization during training. More specifically, a StochasticNet is a realization of a random graph formed prior
to training, and as such the connections in the network are inherently sparse, are permanent, and do not change during
training. This is very different from Dropout [19] and DropConnect [20], where the activations and connections are temporarily
removed during training and put back during testing for regularization purposes only, and as such the resulting neural connectivity
of the network remains dense. There is no notion of ’dropping’ in StochasticNets, as only a subset of the possible neural connections
is formed in the first place prior to training, and the resulting connectivity of the network is sparse.

StochasticNets are also very different from HashNets [21], where connection weights are randomly grouped into hash buckets,
with each bucket sharing the same weights, to explicitly sparsify the network. There is no notion of grouping/merging
in StochasticNets; the formed StochasticNets are naturally sparse due to the formation process. In fact, stochastic strategies such
as HashNets, Dropout, and DropConnect can be used in conjunction with StochasticNets.

The paper is organized as follows. First, a review of random graph theory is presented in Section 2. The theory and design
considerations behind forming StochasticNets as random graph realizations are discussed in Section 3. Experimental results using
four image datasets (CIFAR-10 [24], MNIST [25], SVHN [26], and STL-10 [27]) to investigate the efficacy of StochasticNets
with respect to different numbers of neural connections, as well as relative classification speed, are presented in Section 4. Finally,
conclusions are drawn in Section 5.


2 Review of Random Graph Theory 

In this study, the goal is to leverage random graph theory [17, 18] to form the neural connectivity of deep neural networks in a
stochastic manner. As such, it is important to first provide a general overview of random graph theory for context. In random
graph theory, a random graph can be defined as a probability distribution over graphs [22]. A number of different random graph
models have been proposed in the literature.

A commonly studied random graph model is that proposed by Gilbert [17], in which a random graph can be expressed by $G(n, p)$,
where all possible edge connectivity is said to occur independently with a probability of $p$, where $0 \le p \le 1$. This random
graph model was generalized by Kovalenko [23], in which the random graph can be expressed by $G(V, p_{ij})$, where $V$ is a set
of vertices and the edge connectivity between two vertices $\{i, j\}$ in the graph is said to occur with a probability of $p_{ij}$, where
$0 \le p_{ij} \le 1$. An illustrative example of a random graph based on this model is shown in Figure 1. It can be seen that all possible
edge connectivity between the nodes in the graph may occur independently with a probability of $p_{ij}$.

Therefore, based on this generalized random graph model, realizations of random graphs can be obtained by starting with a
set of $n$ vertices $V = \{v_q \mid 1 \le q \le n\}$ and randomly adding a set of edges between the vertices based on the set of possible
edges $S = \{e_{ij} \mid 1 \le i \le n, 1 \le j \le n\}$ independently with a probability of $p_{ij}$. A number of realizations of the random
graph in Figure 1 are provided in Figure 2 for illustrative purposes. It is worth noting that because of the underlying probability
distribution, the generated realizations of the random graph often exhibit differing edge connectivity.

Figure 2: Realizations of the random graph in Figure 1. The probability for edge connectivity between all nodes in the graph was set
to $p_{ij} = 0.1$ for all nodes $i$ and $j$. Each diagram demonstrates a different realization of the random graph.

Figure 3: Example random graph representing a general deep feed-forward neural network. Every neuron $k$ in layer $i$ may be
connected to neuron $h$ in layer $j$ with probability $p_{i \to j}^{k \to h}$ based on random graph theory. To enforce the properties of a general
deep feed-forward neural network, $p_{i \to j}^{k \to h} = 0$ when $i = j$ or $|i - j| \ge 2$.
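
The sampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `realize_random_graph`, the use of a callable `p(i, j)` for the edge probabilities, and the choice of 10 vertices are all assumptions made for the example.

```python
import random

def realize_random_graph(n, p, seed=None):
    """Sample one realization of a random graph on vertices 0..n-1.

    `p` is a callable p(i, j) giving the connection probability for the
    undirected vertex pair {i, j} (hypothetical interface for this sketch).
    """
    rng = random.Random(seed)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Each possible edge is added independently with probability p(i, j).
            if rng.random() < p(i, j):
                edges.add((i, j))
    return edges

# Two realizations of the same model, with p_ij = 0.1 for every pair.
g1 = realize_random_graph(10, lambda i, j: 0.1, seed=1)
g2 = realize_random_graph(10, lambda i, j: 0.1, seed=2)
print(len(g1), len(g2))
```

Because every edge is drawn independently, two calls with different seeds will generally return different edge sets, which is the behaviour the realizations are meant to illustrate.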

Given that deep neural networks can be fundamentally expressed and represented as graphs $G$, where the neurons are vertices $V$
and the neural connections are edges $S$, one intriguing idea for introducing stochastic connectivity for the formation of deep neural
networks is to treat the formation of deep neural networks as particular realizations of random graphs, which we will describe in
greater detail in the next section.


3 StochasticNets: Deep Neural Networks as Random Graph Realizations 


Let us represent the full network architecture of a deep neural network as a random graph $G(V, p_{i \to j}^{k \to h})$, where $V$ is the set of
neurons $V = \{v_{i,k} \mid 1 \le i \le n_l, 1 \le k \le m_i\}$, with $v_{i,k}$ denoting the $k$-th neuron at layer $i$, $n_l$ denoting the number of layers,
$m_i$ denoting the number of neurons at layer $i$, and $p_{i \to j}^{k \to h}$ denoting the probability that a neural connection occurs between
neuron $v_{i,k}$ and neuron $v_{j,h}$.

Based on the above random graph model for representing deep neural networks, one can then form a deep neural network as
a realization of the random graph $G(V, p_{i \to j}^{k \to h})$ by starting with a set of neurons $V$, and randomly adding neural connections
between the set of neurons independently with a probability of $p_{i \to j}^{k \to h}$ as defined above.

While one can form practically any type of deep neural network as a random graph realization, an important design consideration
for forming deep neural networks as random graph realizations is that different types of deep neural networks have fundamental
properties in their network architecture that must be taken into account and preserved in the random graph realization. Therefore,
to ensure that the fundamental properties of the network architecture of a certain type of deep neural network are preserved, the
probability $p_{i \to j}^{k \to h}$ must be designed in such a way that these properties are enforced appropriately in the resultant random graph
realization. Let us consider a general deep feed-forward neural network. First, in a deep feed-forward neural network, there can
be no neural connections between non-adjacent layers. Second, in a deep feed-forward neural network, there can be no neural
connections between neurons on the same layer. Therefore, to enforce these two properties, $p_{i \to j}^{k \to h} = 0$ when $i = j$ or $|i - j| \ge 2$.
An example random graph based on this random graph model for representing general deep feed-forward neural networks is
shown in Figure 3, with an example realization of the random graph shown in Figure 4. It can be observed in Figure 4 that the
neural connectivity for each neuron may be different due to the stochastic nature of neural connection formation.
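
As a rough sketch of how such a feed-forward realization could be drawn, the Python below samples one binary connectivity mask per pair of adjacent layers. The names `connection_probability` and `realize_feedforward`, the layer sizes, and the use of a single probability shared by all neuron pairs are assumptions for illustration; only connections from layer i to layer i + 1 are sampled, since the probability is zero for every other layer pair.

```python
import random

def connection_probability(i, j, p=0.5):
    """Probability of a connection between layer i and layer j.

    Zero within a layer (i == j) and between non-adjacent layers
    (|i - j| >= 2); otherwise the shared value p (an assumption here).
    """
    if i == j or abs(i - j) >= 2:
        return 0.0
    return p

def realize_feedforward(layer_sizes, p=0.5, seed=None):
    """Return masks where masks[i][k][h] == 1 if neuron k in layer i
    is connected to neuron h in layer i + 1."""
    rng = random.Random(seed)
    masks = []
    for i in range(len(layer_sizes) - 1):
        prob = connection_probability(i, i + 1, p)
        mask = [[1 if rng.random() < prob else 0
                 for _ in range(layer_sizes[i + 1])]
                for _ in range(layer_sizes[i])]
        masks.append(mask)
    return masks

masks = realize_feedforward([4, 5, 3], p=0.5, seed=0)
formed = sum(sum(row) for mask in masks for row in mask)
print(f"{formed} of {4 * 5 + 5 * 3} possible connections formed")
```

The masks are drawn once, before training, and stay fixed afterwards; a weight is only ever allocated or updated where the mask holds a 1.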

Furthermore, for specific types of deep feed-forward neural networks, additional considerations must be taken into account to
preserve their properties in the resultant random graph realization. For example, in the case of deep convolutional neural networks,
neural connectivity in the convolutional layers is arranged such that small spatially localized neural collections are connected
to the same output neuron in the next layer. Furthermore, the weights of the neural connections are shared amongst different
small neural collections. A significant benefit of this architecture is that it allows neural connectivity at the convolutional layers
to be efficiently represented by a set of local receptive fields, thus greatly reducing memory requirements and computational
complexity. To enforce these properties when forming deep convolutional neural networks as random graph realizations, one can
further constrain the probability $p_{i \to j}^{k \to h}$ such that the probability of neural connectivity is defined at a local receptive field level.
As such, the neural connectivity for each randomly realized local receptive field is based on a probability distribution, with the
neural connectivity configuration thus being shared amongst different small neural collections for a given randomly realized local
receptive field.

Figure 4: An example realization of the random graph shown in Figure 3. In this example, $p_{i \to j}^{k \to h} = 0.5$ for all neurons except
when $i = j$ or $|i - j| \ge 2$. It can be observed that the neural connectivity for each neuron may be different due to the stochastic
nature of neural connection formation. The connectivity for the red neuron and the green neuron are highlighted to show the
differences in neural connectivity.

Figure 5: Forming a deep convolutional neural network from a random graph. The neural connectivity for each randomly realized
local receptive field $\{K_1, K_2\}$ is determined based on a probability distribution, and as such the configuration and shape of each
randomly realized local receptive field may differ. It can be seen that the shape and neural connectivity for local receptive field
$K_1$ is completely different from local receptive field $K_2$. The response of each randomly realized local receptive field leads to
an output in new channel $C$. Only one layer of the formed deep convolutional neural network from a random graph is shown for
illustrative purposes.

Given this random graph model for representing deep convolutional neural networks, the resulting random graph realization is
a deep convolutional neural network where each convolutional layer consists of a set of randomly realized local receptive fields
$K$, with each randomly realized local receptive field $K_{i,k}$, which denotes the $k$-th receptive field at layer $i$, consisting of the neural
connection weights of a set of random neurons within a small neural collection to the output neuron. An example of a realization
of a deep convolutional neural network from a random graph is shown in Figure 5.
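
One way to picture a randomly realized local receptive field is as a binary mask over an ordinary convolution kernel. The sketch below assumes a uniform connection probability and uses hypothetical helper names (`realize_receptive_field`, `realize_conv_layer`, `apply_mask`); it is not taken from the paper's implementation.

```python
import random

def realize_receptive_field(size, in_channels, p, rng):
    """Binary mask[c][y][x]: 1 where this receptive field has a connection."""
    return [[[1 if rng.random() < p else 0 for _ in range(size)]
             for _ in range(size)]
            for _ in range(in_channels)]

def realize_conv_layer(num_fields, size, in_channels, p, seed=None):
    """One independently sampled mask per receptive field in the layer."""
    rng = random.Random(seed)
    return [realize_receptive_field(size, in_channels, p, rng)
            for _ in range(num_fields)]

def apply_mask(weights, mask):
    """Zero the weights of connections that were never formed."""
    return [[[w * m for w, m in zip(w_row, m_row)]
             for w_row, m_row in zip(w_chan, m_chan)]
            for w_chan, m_chan in zip(weights, mask)]

# Example: 32 receptive fields of size 5x5 over a 3-channel input.
layer_masks = realize_conv_layer(32, 5, 3, p=0.5, seed=0)
print(sum(v for row in layer_masks[0][0] for v in row), "of 25 connections in one channel")
```

Because the mask belongs to the receptive field rather than to an image location, the same sparse configuration is reused at every spatial position, which keeps the weight sharing of a convolutional layer intact.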


4 Experimental Results 

4.1 Experimental Setup 

To investigate the efficacy of StochasticNets, we construct StochasticNets with a deep convolutional neural network architecture
and evaluate the constructed StochasticNets in a number of different ways. First, we investigate the effect of the number of neural
connections formed in the constructed StochasticNets on their performance for the task of image object recognition. Second, we
investigate the performance of StochasticNets when compared to baseline deep convolutional neural networks (which we will
simply refer to as ConvNets) with standard neural connectivity for different image object recognition tasks based on different
image datasets. Third, we investigate the relative speed of StochasticNets during classification with respect to the number of
neural connections formed in the constructed StochasticNets. It is important to note that the main goal is to investigate the
efficacy of forming deep neural networks via stochastic connectivity in the form of StochasticNets and the influence of stochastic
connectivity parameters on network performance, and not to obtain maximum absolute performance; therefore, the performance
of StochasticNets can be further optimized through additional techniques such as data augmentation and network regularization
methods. For evaluation purposes, four benchmark image datasets are used: CIFAR-10 [24], MNIST [25], SVHN [26], and
STL-10 [27]. A description of each dataset and of the StochasticNet configuration used is given below.

Figure 6: Training and test error versus the number of neural connections in convolutional layers and fully connected layers for
the CIFAR-10 dataset. Both Gaussian distributed and uniform distributed neural connectivity were evaluated. Note that a neural
connectivity percentage of 100 is equivalent to ConvNet, since all connections are made.

4.1.1 Datasets 

The CIFAR-10 image dataset [24] consists of 50,000 training images of natural scenes categorized into 10 different classes (5,000
images per class). Each image is an RGB image that is 32x32 in size. The MNIST image dataset [25] consists of 60,000 training
images and 10,000 test images of handwritten digits. Each image is a binary image that is 28x28 in size, with the handwritten
digits normalized with respect to size and centered in each image. The images in the MNIST dataset were resized to 32x32 by
zero padding since the same StochasticNet network configuration is utilized for all mentioned image datasets. The SVHN image
dataset [26] consists of 604,388 training images and 26,032 test images of digits in natural scenes. Each image is an RGB image
that is 32x32 in size. Finally, the STL-10 image dataset [27] consists of 5,000 labeled training images and 8,000 labeled
test images of natural scenes categorized into 10 different classes (500 training images and 800 test images per class). Each
image is an RGB image that is 96x96 in size. Note that the 100,000 unlabeled images in the STL-10 image dataset were not used
in this study.

4.1.2 StochasticNet Configuration 

The StochasticNets used in this study for all datasets are realized based on the LeNet-5 deep convolutional neural network [25]
architecture, and consist of 3 convolutional layers with 32, 32, and 64 local receptive fields of size 5x5 for the first, second, and
third convolutional layers, respectively, and 1 hidden layer of 64 neurons, with all neural connections in the convolutional and
hidden layers being randomly realized based on probability distributions. While it is possible to take advantage of any arbitrary
distribution to construct StochasticNet realizations, for the purpose of this study the neural connection probability of the hidden
layer follows a uniform distribution, while two different spatial distributions were explored for the convolutional layers: i) a uniform
distribution, and ii) a Gaussian distribution with the mean at the center of the receptive field and the standard deviation being a
third of the receptive field size. All four image datasets have 10 class labels, and the network output is set up accordingly.
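
To make the two spatial distributions concrete, the sketch below builds normalized weight maps over a 5x5 receptive field. How these weights are turned into actual connections is not specified in the text, so the code stops at the weight maps; the function names and the reading of "a third of the receptive field size" as sigma = size / 3 (in pixels) are assumptions.

```python
import math

def uniform_weights(size):
    """Every position in a size x size receptive field is equally likely."""
    w = 1.0 / (size * size)
    return [[w] * size for _ in range(size)]

def gaussian_weights(size):
    """Isotropic Gaussian centered on the field, normalized to sum to 1."""
    center = (size - 1) / 2.0   # mean at the center of the receptive field
    sigma = size / 3.0          # assumed: a third of the field size, in pixels
    raw = [[math.exp(-((y - center) ** 2 + (x - center) ** 2) / (2.0 * sigma ** 2))
            for x in range(size)]
           for y in range(size)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]

g = gaussian_weights(5)
u = uniform_weights(5)
print(f"center: gaussian={g[2][2]:.4f}, uniform={u[2][2]:.4f}")
print(f"corner: gaussian={g[0][0]:.4f}, uniform={u[0][0]:.4f}")
```

Under these assumptions the center of a 5x5 field is weighted roughly four times as heavily as a corner, so Gaussian-distributed connectivity concentrates connections near the middle of each receptive field, whereas the uniform distribution spreads them evenly.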

4.2 Number of Neural Connections 

An experiment was conducted to illustrate the impact of the number of neural connections on the modeling accuracy of
StochasticNets. Figure 6 shows the training and test error versus the number of neural connections in the network for the CIFAR-10
dataset. A StochasticNet with the network configuration described in Section 4.1.2 was trained. The neural
connection probability is varied in both the convolutional layers and the hidden layer to achieve the desired number of neural
connections for testing its effect on modeling accuracy.

Figure 6 shows the training and test error versus the neural connectivity percentage relative to the baseline ConvNet, for
two different neural connection distributions: i) a uniform distribution, and ii) a Gaussian distribution with the mean at the center of
the receptive field and the standard deviation being a third of the receptive field size. It can be observed that StochasticNet is able
to achieve the same test error as ConvNet when the number of neural connections in the StochasticNet is less than half that of the
ConvNet. It can also be observed that, although increasing the number of neural connections results in lower training error, it
does not reduce the test error, which brings to light the issue of over-fitting. In other words, the proposed StochasticNets can
improve the handling of over-fitting associated with deep neural networks while decreasing the
number of neural connections, which in effect greatly reduces the number of computations and thus results in faster network
training and usage. Finally, there is a noticeable difference in the training and test errors when using
Gaussian distributed connectivity compared to uniform distributed connectivity, which indicates that the choice of neural
connectivity probability distribution can have a noticeable impact on model accuracy.

Figure 7: Comparison between a standard ConvNet and a StochasticNet with 39% of the neural connectivity of the ConvNet. For
StochasticNets, the results show the error based on 25 trials, since the neural connectivity of StochasticNets is realized
stochastically. The dashed lines show the standard deviation of the error based on 25 trials for StochasticNets.


4.3 Comparisons with ConvNet 

Motivated by the results shown in Figure 6, a comprehensive experiment was conducted to demonstrate the performance of the
proposed StochasticNets on different benchmark image datasets. StochasticNet realizations were formed via Gaussian-distributed
connectivity, with 39% of the neural connectivity of a conventional ConvNet. The StochasticNets and ConvNets
were trained on four benchmark image datasets (i.e., CIFAR-10, MNIST, SVHN, and STL-10) and their training and test error
performances are compared to each other. Since the neural connectivity of StochasticNets is realized stochastically, the
performance of the StochasticNets was evaluated based on 25 trials (leading to 25 StochasticNet realizations) and the reported results
are based on the average of the 25 trials. Figure 7 shows the training and test error results of the StochasticNets and ConvNets on
the four different tested datasets. It can be observed that, despite the fact that there are fewer than half as many neural connections
in the StochasticNet realizations, the test errors of the ConvNets and the StochasticNet realizations can be considered to be the
same for the CIFAR-10, MNIST, and SVHN datasets. Interestingly, it was also observed that the test error for the StochasticNet
realizations is lower than that achieved using the ConvNet (relative improvement in test error rate of ~6% compared to ConvNet)
for the STL-10 dataset, again despite the fact that there are fewer than half as many neural connections in the StochasticNet
realizations. The results for the STL-10 dataset illustrate the effectiveness of StochasticNets, particularly when dealing
with a low number of training samples.
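
The text does not spell out how the relative improvement is computed. A common convention, assumed here, is the reduction in test error expressed as a fraction of the ConvNet test error, with $e_{\text{ConvNet}}$ and $e_{\text{StochasticNet}}$ denoting the respective test error rates (symbols introduced for this illustration):

```latex
\[
\text{relative improvement}
  = \frac{e_{\text{ConvNet}} - e_{\text{StochasticNet}}}{e_{\text{ConvNet}}}
\]
```

Under this reading, a value of about 6% means the StochasticNet test error is roughly 0.94 times the ConvNet test error, not that the error dropped by six percentage points.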

Furthermore, the gap between the training and test errors of the StochasticNets is smaller than that of the ConvNets, which
indicates reduced overfitting in the StochasticNets. The standard deviation of the 25 trials for each error curve is shown as dashed
lines around the error curve. It can be observed that the standard deviation of the 25 trials is very small, which indicates that the
proposed StochasticNet exhibited similar performance in all 25 trials.

Figure 8: Relative classification time versus the number of neural connections. Note that a neural connectivity percentage of 100 is
equivalent to ConvNet, since all connections are made.

4.4 Relative Speed vs. Number of Neural Connections 

Given that the experiments in the previous sections show that StochasticNets can achieve good performance relative to
conventional ConvNets while having significantly fewer neural connections, we now further investigate the relative speed of
StochasticNets during classification with respect to the number of neural connections formed in the constructed StochasticNets. Here,
as in Section 4.2, the neural connection probability is varied in both the convolutional layers and the hidden layer to achieve
the desired number of neural connections for testing its effect on the classification speed of the formed StochasticNets. Figure 8
shows the relative classification time versus the neural connectivity percentage relative to the baseline ConvNet. The relative
time is defined as the time required during the classification process relative to that of the ConvNet. It can be observed that the
relative time decreases as the number of neural connections decreases, which illustrates the potential for StochasticNets to enable
more efficient classification.
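
The relative time defined above is a ratio of two measured runtimes. The sketch below shows one way to measure such a ratio; the helper names `mean_runtime` and `relative_time` are hypothetical, and the dense and sparse weighted sums are toy stand-ins for the classification pass of a ConvNet and a StochasticNet, not the networks evaluated in the paper.

```python
import time

def mean_runtime(fn, repeats=20):
    """Average wall-clock seconds per call of fn over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

def relative_time(stochastic_fn, convnet_fn, repeats=20):
    """Runtime of the sparse computation relative to the dense baseline."""
    return mean_runtime(stochastic_fn, repeats) / mean_runtime(convnet_fn, repeats)

# Toy stand-ins: a dense weighted sum versus one that only visits formed connections.
n = 20000
weights = [0.5] * n
inputs = [1.0] * n
kept = list(range(0, n, 3))  # hypothetical: about a third of the connections formed

def dense():
    return sum(w * x for w, x in zip(weights, inputs))

def sparse():
    return sum(weights[i] * inputs[i] for i in kept)

print(f"relative time: {relative_time(sparse, dense):.2f}")
```

Any speed-up depends on the implementation actually skipping connections that were never formed, as `sparse` does here; multiplying a dense weight array by a zero mask would perform the same number of operations as the baseline.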


5 Conclusions 

In this study, we introduced a new approach to deep neural network formation inspired by the stochastic connectivity exhibited
in synaptic connectivity between neurons. The proposed StochasticNet is a deep neural network that is formed as a realization of
a random graph, where the synaptic connectivity between neurons is formed stochastically based on a probability distribution.
Using this approach, the neural connectivity within the deep neural network can be formed in a way that facilitates efficient neural
utilization, resulting in deep neural networks with far fewer neural connections while achieving the same modeling accuracy.
The effectiveness and efficiency of the proposed StochasticNet was evaluated using four popular benchmark image datasets
and compared to a conventional convolutional neural network (ConvNet). Experimental results demonstrate that the proposed
StochasticNet provides accuracy comparable to the conventional ConvNet with far fewer neural connections while
reducing the overfitting associated with the conventional ConvNet for the CIFAR-10, MNIST, and SVHN datasets. More
interestingly, a StochasticNet with far fewer neural connections was found to achieve higher accuracy when compared
to conventional deep neural networks for the STL-10 dataset. As such, the proposed StochasticNet holds great potential for
enabling the formation of much more efficient deep neural networks that have fast operational speeds while still achieving strong
accuracy.